Heuristic-based Baseline Removal Algorithm for SELDI Proteomics Data
Abstract
Motivation: Baseline removal is the first preprocessing step for SELDI data and critically influences every subsequent analysis step. Current baseline removal algorithms for SELDI data, which are based on mathematical morphology, produce biased signal estimates. Because of how these algorithms are parameterized, the noise and spectral signal distributions bias the removal results, which may lead to seemingly interesting but ultimately irreproducible findings in downstream analysis.
Results: We propose a Heuristic-based Baseline Removal (HbBr) algorithm to model the baseline. HbBr first identifies potential peak regions, using first-derivative and amplitude information as a fast heuristic, and then down-weights those regions before modeling the baseline with a nonparametric smooth curve. On a series of benchmark experimental data sets and simulated data sets, it outperformed the mathematical morphology-based algorithms implemented in the PROcess package of Bioconductor. HbBr is also computationally more efficient than PROcess. Furthermore, although designed for SELDI, the HbBr algorithm yields a good baseline correction of MALDI data without any parameter adjustment.
Availability: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project. http://basic.northwestern.edu/publications/baseline/
Contact: Simon M. Lin, Pan Du, and Warren A. Kibbe.
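The HbBr procedure summarized in the abstract (flag likely peak regions from first-derivative and amplitude information, down-weight them, then fit a smooth curve) can be illustrated with a minimal Python sketch. This is not the authors' R implementation. The function names, the cutoff factors, the window sizes, and the choice of a weighted Gaussian-kernel smoother as the "nonparametric smooth curve" are all assumptions made here for illustration, and the sketch assumes evenly spaced samples.

```python
import math
from statistics import median


def flag_peak_regions(y, slope_factor=3.0, amp_factor=3.0,
                      median_half_window=40, margin=5):
    """Heuristically flag samples that probably belong to a peak.

    All thresholds are illustrative assumptions, not published values.
    """
    n = len(y)
    # First derivative: central differences inside, one-sided at the ends.
    slope = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n - 1, i + 1)
        slope.append((y[hi] - y[lo]) / (hi - lo))
    # Robust scale of point-to-point variation (unit sample spacing),
    # reused for both cutoffs; the fallback avoids a zero threshold.
    scale = median([abs(s) for s in slope]) or 1e-12
    raw = []
    for i in range(n):
        # Symmetric window, shrunk near the ends so it stays centred on i.
        w = min(median_half_window, i, n - 1 - i)
        local_level = median(y[i - w:i + w + 1])
        steep = abs(slope[i]) > slope_factor * scale      # derivative cue
        tall = y[i] - local_level > amp_factor * scale    # amplitude cue
        raw.append(steep or tall)
    # Widen every flagged run by `margin` samples to cover the peak's feet.
    return [any(raw[max(0, i - margin):i + margin + 1]) for i in range(n)]


def estimate_baseline(y, flagged, bandwidth=30.0, down_weight=0.01):
    """Weighted Gaussian-kernel smooth; flagged samples get `down_weight`.

    `down_weight` must stay positive so the denominator is never zero.
    """
    n = len(y)
    reach = int(3 * bandwidth)
    kernel = [math.exp(-0.5 * (d / bandwidth) ** 2) for d in range(reach + 1)]
    weight = [down_weight if f else 1.0 for f in flagged]
    baseline = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - reach), min(n, i + reach + 1)):
            k = kernel[abs(i - j)] * weight[j]
            num += k * y[j]
            den += k
        baseline.append(num / den)
    return baseline


def remove_baseline(y, **smoother_kwargs):
    """Return (corrected signal, estimated baseline, peak flags)."""
    flagged = flag_peak_regions(y)
    baseline = estimate_baseline(y, flagged, **smoother_kwargs)
    corrected = [v - b for v, b in zip(y, baseline)]
    return corrected, baseline, flagged
```

Calling `remove_baseline(spectrum)` on a list of intensities returns the corrected intensities together with the fitted baseline and the peak flags. Passing all-`False` flags to `estimate_baseline` gives an unweighted smooth, which makes it easy to see how much a peak pulls the baseline upward when it is not down-weighted.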
Similar resources
Quantitative quality-assessment techniques to compare fractionation and depletion methods in SELDI-TOF mass spectrometry experiments
MOTIVATION Mass spectrometry (MS), such as the surface-enhanced laser desorption and ionization time-of-flight (SELDI-TOF) MS, provides a potentially promising proteomic technology for biomarker discovery. An important matter for such a technology to be used routinely is its reproducibility. It is of significant interest to develop quantitative measures to evaluate the quality and reliability o...
Extended Study in Computing Science 20p Pattern Recognition Algorithms for Cancer Detection applied on data from SELDI Technology
The SELDI technology is a technique that measures the content of different proteins in blood samples from patients. Many research groups have shown that there appears to exist a relation between the concentrations of specific proteins and cancer disease. The output from the SELDI system is a result of mass-spectrometry and is a spectrum containing the concentrations of thousands of separate prot...
A new genetic algorithm in proteomics: Feature selection for SELDI-TOF data
Mass spectrometry from clinical specimens is used in order to identify biomarkers in a diagnosis. Thus, a reliable method for both feature selection and classification is required. A novel method is proposed to find biomarkers in SELDI-TOF in order to perform robust classification. The feature selection is based on a new genetic algorithm. Concerning the classification, a method which takes into...
Mass spectrometry data processing using zero-crossing lines in multi-scale of Gaussian derivative wavelet
MOTIVATION Peaks are the key information in mass spectrometry (MS), which has been increasingly used to discover disease-related proteomic patterns. Peak detection is an essential step for MS-based proteomic data analysis. Recently, several peak detection algorithms have been proposed. However, in these algorithms, there are three major deficiencies: (i) because the noise is often removed, the ...
Application of the random forest classification algorithm to a SELDI-TOF proteomics study in the setting of a cancer prevention trial.
A thorough discussion of the random forest (RF) algorithm as it relates to a SELDI-TOF proteomics study is presented, with special emphasis on its application for cancer prevention: specifically, what makes it an efficient, yet reliable classifier, and what makes it optimal among the many available approaches. The main body of the paper treats the particulars of how to successfully apply the RF...
Publication date: 2006